How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation

Authors

  • Chia-Wei Liu
  • Ryan Lowe
  • Iulian Serban
  • Michael Noseworthy
  • Laurent Charlin
  • Joelle Pineau
Abstract

We investigate evaluation metrics for end-to-end dialogue systems where supervised labels, such as task completion, are not available. Recent works in end-to-end dialogue systems have adopted metrics from machine translation and text summarization to compare a model’s generated response to a single target response. We show that these metrics correlate very weakly or not at all with human judgements of the response quality in both technical and non-technical domains. We provide quantitative and qualitative results highlighting specific weaknesses in existing metrics, and provide recommendations for future development of better automatic evaluation metrics for dialogue systems.

Similar resources

Speech understanding, dialogue management and response generation in corpus-based spoken dialogue system

This paper presents the construction of a spoken dialogue system using a large-scale spoken dialogue corpus with intention tags. In this system, all of the main components, such as speech understanding, dialogue management, and response generation, are constructed with corpus-based methods. An evaluation experiment using a test set has shown that the performance of the corpus-based dialogue system is i...

On-Line Learning of a Persian Spoken Dialogue System Using Real Training Data

The first spoken dialogue system developed for the Persian language is introduced. This is a ticket reservation system with Persian ASR and NLU modules. The focus of the paper is on learning the dialogue management module. In this work, real on-line training data are used during the learning process. For on-line learning, the effect of the variations of discount factor (g) on the learning speed...

Some empirical findings on dialogue management and domain ontologies in dialogue systems - Implications from an evaluation of BirdQuest

In this paper we present implications for the development of dialogue systems, based on an evaluation of the system BIRDQUEST, which combines dialogue interaction with information extraction. A number of issues detected during the evaluation, primarily concerning dialogue management and the representation and use of domain knowledge, are presented and discussed.

Empirical Evaluation of a Reinforcement Learning Spoken Dialogue System

We report on the design, construction and empirical evaluation of a large-scale spoken dialogue system that optimizes its performance via reinforcement learning on human user dialogue data.

UCG Used by Response Generation

The paper deals with a spoken dialogue system component – response generation module. We are developing the spoken dialogue system called CIC (city information centre) providing a subset of services of a real city information centre. The main focus of this article is an experiment with usage of UCG (Unification Categorial Grammar) for response generation within a dialogue system speaking Czech....


Publication date: 2016